PoseGaze-AHP: A Knowledge-Based 3D Dataset for AI-Driven Ocular and Postural Diagnosis

Al-Dabet, Saja, Turaev, Sherzod, Zaki, Nazar, Khan, Arif O., Eldweik, Luai

arXiv.org Artificial Intelligence

Diagnosing ocular-induced abnormal head posture (AHP) requires a comprehensive analysis of both head pose and ocular movements. However, existing datasets focus on these aspects separately, limiting the development of integrated diagnostic approaches and restricting AI-driven advancements in AHP analysis. To address this gap, we introduce PoseGaze-AHP, a novel 3D dataset that synchronously captures head pose and gaze movement information for ocular-induced AHP assessment. Structured clinical data were extracted from medical literature using large language models (LLMs) through an iterative process with the Claude 3.5 Sonnet model, combining stepwise, hierarchical, and complex prompting strategies. The extracted records were systematically imputed and transformed into 3D representations using the Neural Head Avatar (NHA) framework. The dataset includes 7,920 images generated from two head textures, covering a broad spectrum of ocular conditions. The extraction method achieved an overall accuracy of 91.92%, demonstrating its reliability for clinical dataset construction. PoseGaze-AHP is the first publicly available resource tailored for AI-driven ocular-induced AHP diagnosis, supporting the development of accurate and privacy-compliant diagnostic tools.


Graph Convolutional Neural Networks to Model the Brain for Insomnia

Monteiro, Kevin, Nallaperuma-Herzberg, Sam, Mason, Martina, Niederer, Steve

arXiv.org Artificial Intelligence

Insomnia affects a large share of the world's population and can have a wide range of causes. Existing treatments for insomnia have been linked with many side effects, such as headaches and dizziness. As such, there is a clear need for improved insomnia treatment. Brain modelling has helped with assessing the effects of brain pathology on brain network dynamics and with supporting clinical decisions in the treatment of Alzheimer's disease, epilepsy, and other conditions. However, such models have not been developed for insomnia. Therefore, this project attempts to understand the characteristics of the brains of individuals experiencing insomnia using continuous long-duration EEG data. Brain networks are derived from functional connectivity and spatial distance between EEG channels. The power spectral density of the channels is then computed for the major brain-wave frequency bands. A graph convolutional neural network (GCNN) model is then trained to capture the functional characteristics associated with insomnia and evaluated on a classification task to judge performance. Results indicated that a 50-second non-overlapping sliding window was the most suitable choice for EEG segmentation. This approach achieved a classification accuracy of 70% at the window level and 68% at the subject level. Additionally, the omission of EEG channels C4-P4, F4-C4 and C4-A1 caused greater degradation in model performance than the removal of other channels. These channel electrodes are positioned near brain regions known to exhibit atypical levels of functional connectivity in individuals with insomnia, which can explain these results.
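The graph-convolution step described in the abstract can be sketched with toy data. The channel count, feature dimensions, random connectivity, and random weights below are illustrative assumptions, not the paper's actual configuration; the layer follows the standard GCN rule ReLU(A_norm X W) with symmetric normalization:

```python
import numpy as np

# Hypothetical sketch: one graph-convolution layer over EEG channels.
# Nodes = channels, edges = functional connectivity, node features = band power.
rng = np.random.default_rng(0)
n_channels, n_bands = 6, 5                    # e.g. delta..gamma band powers

X = rng.random((n_channels, n_bands))         # node features: PSD per band
A = (rng.random((n_channels, n_channels)) > 0.5).astype(float)
A = np.maximum(A, A.T)                        # symmetric connectivity graph
A_hat = A + np.eye(n_channels)                # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt      # symmetric normalization

W = rng.random((n_bands, 8))                  # weights (random, not trained here)
H = np.maximum(A_norm @ X @ W, 0.0)           # GCN layer: ReLU(A_norm X W)

# Graph-level readout that a window-level classifier would consume
logits = H.mean(axis=0)
```

In a trained model, several such layers would be stacked and the readout fed to a classifier that labels each 50-second EEG window as insomnia or control.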


Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications

Rahman, Ben

arXiv.org Artificial Intelligence

Semantic segmentation has made significant strides in pixel-level image understanding, yet it remains limited in capturing contextual and semantic relationships between objects. Current models, such as CNN and Transformer-based architectures, excel at identifying pixel-level features but fail to distinguish semantically similar objects (e.g., "doctor" vs. "nurse" in a hospital scene) or understand complex contextual scenarios (e.g., differentiating a running child from a regular pedestrian in autonomous driving). To address these limitations, we propose a novel Context-Aware Semantic Segmentation framework that integrates Large Language Models (LLMs) with state-of-the-art vision backbones. Our hybrid model leverages the Swin Transformer for robust visual feature extraction and GPT-4 for enriching semantic understanding through text embeddings. A Cross-Attention Mechanism is introduced to align vision and language features, enabling the model to reason about context more effectively. Additionally, Graph Neural Networks (GNNs) are employed to model object relationships within the scene, capturing dependencies that are overlooked by traditional models. Experimental results on benchmark datasets (e.g., COCO, Cityscapes) demonstrate that our approach outperforms existing methods in both pixel-level accuracy (mIoU) and contextual understanding (mAP). This work bridges the gap between vision and language, paving the way for more intelligent and context-aware vision systems in applications including autonomous driving, medical imaging, and robotics.
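The cross-attention alignment step can be sketched as scaled dot-product attention from visual features (queries) to text embeddings (keys/values). The dimensions and random features below are assumptions for illustration, not the framework's actual sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_tokens, d = 16, 4, 32

V_feat = rng.standard_normal((n_pixels, d))   # visual features (queries)
T_feat = rng.standard_normal((n_tokens, d))   # text embeddings (keys/values)

scores = V_feat @ T_feat.T / np.sqrt(d)       # scaled dot-product scores
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)       # softmax over text tokens
fused = attn @ T_feat                         # language-enriched pixel features
```

Each pixel feature is thus replaced by a weighted mix of text embeddings, which is what lets semantic distinctions from the language side (e.g., "doctor" vs. "nurse") influence pixel-level predictions.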


Segmentation-free Connectionist Temporal Classification loss based OCR Model for Text Captcha Classification

Khatavkar, Vaibhav, Velankar, Makarand, Petkar, Sneha

arXiv.org Artificial Intelligence

Captcha are widely used to secure systems from automated responses by distinguishing computer responses from human responses. Text, audio, video, and picture-based captcha are all in use; text-based captcha, which are read using Optical Character Recognition (OCR), are the most common and face issues of complex and distorted contents. There have been attempts to build captcha detection and classification systems using machine learning and neural networks, which need to be tuned for accuracy. Existing systems face challenges in recognizing distorted characters, handling variable-length captcha, and finding sequential dependencies in captcha. In this work, we propose a segmentation-free OCR model for text captcha classification based on the connectionist temporal classification (CTC) loss technique. The proposed model is trained and tested on a publicly available captcha dataset, achieving 99.80% character-level accuracy and 95% word-level accuracy. The accuracy of the proposed model is compared with state-of-the-art models and proves to be effective. Variable-length, complex captcha can thus be processed with the segmentation-free CTC loss technique, which can be widely applied to securing software systems.


#306: Microlocation, with David Mindell

Robohub

David discusses a system they developed that can detect the location of a special tracking device to centimeter-level accuracy. They are currently developing a device to detect location down to millimeter-level accuracy. This solves the core problem of localization for robots. David co-founded Humatics with a mission to revolutionize how people and machines locate, navigate and collaborate. He is a professor of Aeronautics and Astronautics at MIT, as well as the Dibner Professor of the History of Engineering and Manufacturing, and Chair of the MIT Task Force on the Work of the Future.


Broadband DOA estimation using Convolutional neural networks trained with noise signals

Chakrabarty, Soumitro, Habets, Emanuël A. P.

arXiv.org Machine Learning

A convolutional neural network (CNN) based classification method for broadband DOA estimation is proposed, where the phase component of the short-time Fourier transform coefficients of the received microphone signals is directly fed into the CNN and the features required for DOA estimation are learned during training. Since only the phase component of the input is used, the CNN can be trained with synthesized noise signals, thereby making the preparation of the training data set easier compared to using speech signals. Through experimental evaluation, the ability of the proposed noise-trained CNN framework to generalize to speech sources is demonstrated. In addition, the robustness of the system to noise and small perturbations in microphone positions, as well as its ability to adapt to different acoustic conditions, is investigated using experiments with simulated and real data.

Index Terms: source localization, convolutional neural networks, supervised learning, DOA estimation

1. INTRODUCTION

Many applications such as hands-free communication, teleconferencing, and distant speech recognition require information on the location of a sound source in the acoustic environment.
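The phase-only input feature can be sketched as follows; the array shapes, window choice, and use of random noise are assumptions for illustration, mirroring the idea that the STFT phase (not the magnitude) of each microphone is what the CNN classifies:

```python
import numpy as np

# Hypothetical sketch: build one phase-map "image" (mics x frequency bins)
# from a single STFT frame, as input to a DOA-classification CNN.
rng = np.random.default_rng(2)
fs, n_fft, n_mics = 16000, 256, 4
sig = rng.standard_normal((n_mics, fs))    # noise signals suffice for training

frame = sig[:, :n_fft] * np.hanning(n_fft) # one windowed frame per microphone
spec = np.fft.rfft(frame, axis=1)          # STFT coefficients of the frame
phase_map = np.angle(spec)                 # keep only the phase component
```

Because the inter-microphone phase pattern encodes the direction of arrival regardless of the source's spectral content, a CNN trained on such phase maps from noise can generalize to speech sources.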